home *** CD-ROM | disk | FTP | other *** search
-
-
- FLYING PACKET RADIOS AND NETWORK PARTITIONS
- IEN #146
- PRTN #292
- Radia Perlman
- Bolt Beranek and Newman, Inc.
- June 1980
-
-
- I. INTRODUCTION
-
- As described in IEN 110, "Internet Addressing and Naming in a
- Tactical Environment", a network can become partitioned into two
- or more pieces. Assuming some of these pieces are still
- connected to the catenet, we would like the catenet to be able to
- efficiently deliver packets to a host in any such piece. Such a
- capability in the catenet could additionally be utilized by a
- scheme for delivering intranet traffic across partitions in a
- partitioned network.
-
- Another problem is known as the flying packet radio problem, in
- which there are two ground PR nets and an airborne PR,
- potentially in radio contact with either or both ground nets.
- The problem is to route internet packets to that airborne PR.
-
- In IEN #120 I presented a design for network partitioning and for
- the flying packet radio problem. This paper differs from IEN
- #120 in several ways:
-
- 1) In this paper there is a simpler solution proposed for finding
- the host/partition correspondence.
-
- 2) In this paper an argument is made for doing the link state
- routing algorithm in a straight per gateway (rather than per net)
- computation. This is more costly, since there are more gateways
- than nets, but it is more straightforward to implement, and is
- more flexible.
-
- 3) A different solution is proposed for the flying packet radio
- problem. The solution in IEN #120 was an easily implementable
- one that could be immediately implemented. The solution in this
- paper depends on the rest of the design being implemented, but is
- a less costly solution.
-
- 4) In IEN #135, Carl Sunshine and Jon Postel present an
- alternative approach to the flying packet radio problem. A
- comparison of the two approaches is made in this paper.
-
- The currently implemented gateway routing algorithm is based on
- the original ARPANET algorithm. To efficiently provide for
- routing to network partitions, routing must be based on a link
- state routing scheme. The necessity for a link state routing
- scheme was demonstrated in IEN #120. In this paper I will merely
- present the design.
-
-
- - 1 -
-
-
-
- II. LINK STATE ROUTING
-
- A "link state" routing scheme is one in which the nodes computing
- the routing have complete knowledge of the state of all the links
- in the network. All nodes monitor the state of their links to
- their neighbors, and report this information to the nodes that
- compute routing. In a totally distributed algorithm, all nodes
- compute routing, so that means that all nodes must broadcast the
- state of their links to every other node.
-
- A link state scheme is currently in operation in the ARPANET. An
- alternative to a link state scheme is the algorithm that used to
- be in operation in the ARPANET, and is currently implemented in
- the gateways. In the old-style ARPANET routing algorithm, nodes
- give to their neighbors a vector of their distance to all
- destinations, and a node compiles its own distance vector by
- taking the minimum distance of its distance to a given neighbor,
- plus that neighbor's distance to the destination. The advantage
- of a link state scheme over the old-style ARPANET scheme is that
- a link state scheme is more flexible, since nodes have more
- information.
-
- The most straightforward link state scheme would be one where
- each gateway computes routes from itself to all other gateways.
- This design was specified in IEN #24 (also known as PRTN #242).
- Let us call that scheme the per-gateway scheme.
-
- In IEN #120 (PRTN #279) I proposed a modification to the
- per-gateway design, wherein gateways computed routes to
- destination networks rather than destination gateways. Let us
- call that scheme the per-network scheme. The per-network scheme
- is computationally less costly for the gateways, since there are
- more gateways than nets. However, there is a problem with the
- per-net scheme. The problem is that in the per-net scheme,
- different costs cannot be assigned to different pairs of gateways
- on the same network. And in networks like the packet radio net,
- or the ARPANET, the delay between very distant gateways on the
- same net can be very different from the delay between close
- gateways on that net. Currently there is no mechanism for
- measuring delays between neighbor gateways, and the cost function
- is the simplest possible -- number of hops. However, at some
- point in the future we might want to use a more sophisticated
- cost function. Thus I recommend abandoning the per-network
- approach and going to the straightforward per-gateway approach.
-
- Currently there are few enough gateways so the per-gateway
- approach would not be a problem. If in the future there are too
- many gateways to make this approach feasible (more than 100),
- there are other approaches that can be taken. For example,
- instead of a totally distributed algorithm, there can be a few
- "routing centers" distributed around the catenet. These routing
- centers would be large enough machines so that they would not
- have space problems with computations involving hundreds of
-
-
- - 2 -
-
-
-
- nodes, and do not have to be gateways, so the time involved in
- computing routes would not degrade gateway forwarding
- performance. This would make the link state scheme less costly,
- since gateways would only have to report the state of their links
- to the routing centers, not to all gateways.
-
- If the number of gateways was truly huge (more than a few
- hundred), it would not be practical even for a large routing
- center to compute routes for a network that large. In that case
- a heirarchical approach, of breaking the net into subnetworks and
- treating the network of subnetworks as an approximation to the
- entire network should be used. This approach has been taken in
- the multistation packet radio design, which is a design to
- accomodate a very large network of PRs.
-
- If it is decided that the capability of assigning different costs
- to different pairs of gateway links is not essential, the
- per-network scheme might be adopted, so I will include the
- description here.
-
-
- III. TERMINOLOGY
-
- 1) neighbor gateways--two gateways attached to the same network
-
- 2) functioning neighbor gateways--neighbor gateways able to
- communicate with each other over their common network
-
- 3) attached network--a network physically attached to a gateway,
- and with which the gateway can communicate directly (not through
- another gateway)
-
- 4) neighbor network of gateway G--an attached network of a
- functioning neighbor gateway of G, excluding attached networks of
- G
-
-
- IV. TABLES TO BE MAINTAINED BY EACH GATEWAY
-
- 1) a list of attached networks--This list is relatively constant
- and is updated by a gateway when it notices a network interface
- is down or for some other reason the gateway is incapable of
- communicating with an attached network. Keeping this table
- updated is solely the responsibility of each gateway, and does
- not require intergateway communication.
-
- 2) a table of all gateways and their attached networks--This
- table is maintained by intergateway communication -- gateways
- give copies of their table 1 to all other gateways. The table of
- all gateways never shrinks (a down gateway is assumed to exist
- but be unreachable).
-
-
-
-
- - 3 -
-
-
-
- 3) a table of link states to neighbor gateways--This table in
- gateway G specifies, for each neighbor gateway G1, over which
- common networks G and G1 can communicate. This table is updated
- by G periodically bouncing packets off each neighbor gateway from
- which it has not recently received traffic. Note that I refer to
- two gateways as neighbor gateways even if they cannot
- (temporarily, hopefully) communicate with each other.
-
- 4) a list of neighbor networks--This list is derived from the
- table of link states to neighbor gateways and the list of
- gateways with attached networks (tables 3 and 2).
-
- 5) total link state--This is a table of all gateways and the
- state of their links to their neighbor gateways. This table is
- compiled from intergateway communication. When a gateway notices
- that its table of attached networks, or its table of link states
- to neighbor gateways (tables 2 and 3) changes, that gateway
- efficiently broadcasts this information to all other gateways in
- the catenet. To minimize numbers of reports when a link is
- flaky, a link on an attached network must be up continuously for
- some amount of time before its state is considered to change from
- down to up and trigger a link state report.
-
- 6) shortest distance matrix--This is a data structure from which
- routing decisions can be made directly. It is computed from the
- other tables. It is described more fully in part V.
-
-
- V. ROUTING COMPUTATION
-
- 5.1 Per-Network Scheme
-
- A gateway, using the tables described above, constructs a
- connectivity matrix whose rows and columns represent networks,
- and whose entries are 1 if any gateways claim to be attached to
- both networks, and infinity otherwise. Then the gateway *'s that
- matrix to construct a shortest distance matrix. (The operation
- "*" consists of "multiplying" a matrix by itself, using the
- operations min and plus instead of plus and times, until the
- result stabilizes. This is a well-known algorithm.) The gateway
- then looks in the shortest distance matrix for the neighbor
- network (or set of such) closest to the destination network, and
- chooses a functioning neighbor gateway (or set of such) attached
- to that neighbor network, for forwarding packets to that
- destination network.
-
- If the cost function assigns different costs to different
- networks, then instead of merely putting a "1" in the
- connectivity matrix where there is connectivity, the gateway does
- the following. If the assigned cost (a constant) of network A is
- C(a) and the assigned cost of network B is C(b), then in the
- connectivity matrix for the entry [A,B], deposit C[a]. In the
- entry [B,A] deposit C[b]. In other words, assign the cost of the
- network you are leaving.
-
- - 4 -
-
-
-
- When a link state report changes the state of an entry in the
- connectivity matrix (remember, all gateways connecting two
- networks have to go down before an entry changes to infinity), a
- gateway must recompute the distance matrix.
-
- This design is a slight modification of the design presented in
- "Gateway Routing", by Radia Perlman (PRTN #242, IEN #24). The
- modification is that the indices of the matrix are networks, not
- gateways. The purpose of this modification is to make the size
- of the matrix smaller, an important modification given that in
- the catenet there are many more gateways than networks. There
- are aspects to the scheme that are irrelevant to a discussion of
- how to solve the network partition problem, such as sequence
- numbers for link state reports, etc. The purpose of this paper
- is to direct a correct approach to the design, and not to present
- an implementation specification. Thus an implementer should read
- PRTN 242 to discover the details of a link state algorithm that
- were not relevant for presentation here.
-
- Note that an alternative to *'ing the matrix is to use the scheme
- that the ARPANET has switched over to, which is a link state
- scheme in which a shortest path routing tree is constructed from
- the connectivity information. The new ARPANET scheme is less
- costly to maintain as links change state. Its disadvantages are
- that it precludes load splitting, probably a very important
- problem in the case of the catenet, and is probably a little
- harder to implement. Since links will not change state very
- often, the author favors the overhead of the matrix *'ing scheme
- over the disadvantages of the ARPANET scheme. However, this
- decision is separable from the rest of the design and can be
- decided either way at a later time.
-
- 5.2 Per-Gateway Scheme
-
- This scheme is more straightforward. The rows and columns in the
- connectivity matrix represent gateways. If different costs are
- assigned to different gateway links on the same network, gateways
- would report the cost of their links to their neighbors in their
- link state reports, and this cost would be deposited into the
- entries in the connectivity matrix.
-
- As in the per-net scheme, the connectivity matrix could be *'ed,
- or the Dijkstra algorithm could be applied.
-
-
- VI. DETECTING THAT A NETWORK HAS PARTITIONED
-
- Now we look at the problem of network partitions. In the design
- presented so far there is enough information for any gateway to
- detect a partitioned network and to isolate groups of gateways on
- each partition: A gateway G knows that network N is partitioned
- if there are two sets of gateways, set Q and set R, such that all
- gateways in both sets report they are attached to network N, but
-
-
- - 5 -
-
-
-
- there are no two-way links between a member of set Q and a member
- of set R via network N. This information is derived
- independently by each gateway from the table of all gateways and
- their attached networks, and from the table of total link state
- (tables 2 and 5).
-
-
- VII. DERIVING A NAME FOR EACH PARTITION
-
- It is necessary to expand the internet header to allow a field
- for identifying a network partition. The reason for this is to
- avoid the necessity for every gateway on a packet's route to
- discover to which partition the packet should be sent.
-
- The partition name must give sufficient information so that every
- gateway can make the proper routing decisions to send a packet to
- that partition, based on its tables of total link state and
- gateways/attached nets (tables 5 and 2).
-
- The following schemes for naming a partition are all done
- independently by all gateways, as opposed to having some central
- authority choose a name and inform all gateways, or having a
- group of gateways decide on a name "by committee".
-
- One method of identifying a partition is to use the name of any
- member gateway of the partition. It will not matter if two
- gateways choose different names for the same partition. Since
- the sets of gateways involved in the network partitions are
- disjoint, any member of the set identifies the set.
-
- Another method is to list (either by an explicit list or a bit
- table) the set of gateways that make up that partition. This is
- unnecessarily descriptive, since the list of gateways is
- derivable from a single member of the set. And it is a less
- robust scheme, because any change to the partition (a gateway
- going down, coming up, or the net partitioning into more pieces)
- can confuse a gateway trying to route to that set of gateways.
- In the first method, if the partition changes, the packet will be
- routed unambiguously to whatever partition the named gateway is
- in. Of course, if the named gateway goes down, the packet
- becomes undeliverable, but that is easier to deal with than
- trying to deliver a packet to a set of gateways that overlaps two
- partitions.
-
- A third method is for each gateway to number partitions from 1 to
- the number of partitions, ordered by, say, the highest numbered
- gateway in each partition. This method uses fewer bits in the
- packet header but is a much less robust scheme. With gateways
- having slightly differing information, partition names have
- different meanings. Also, partitions can switch names suddenly.
- For instance, a net can be partitioned into 2 pieces, numbered 1
- and 2, and, assuming the highest numbered gateway was down, and
- comes up in partition 2, partitions 1 and 2 now switch
- identities.
-
- - 6 -
-
-
-
- Thus the recommended method of identifying a partition is the
- first method.
-
-
- VIII. FIGURING OUT WHICH PARTITION A HOST IS IN
-
- This is the aspect of the design for which I did not find the
- design presented in IEN #120 completely satisfying. Here is a
- better approach.
-
- Important goals are to:
-
- 1) Shield gateways from state information such as which hosts are
- in which partitions.
-
- 2) Shield hosts from the necessity of knowing much about the
- structure of the catenet. In particular, since hosts do not
- receive gateway link state reports, they do not know which
- gateways and links are up, do not dynamically discover new
- gateways and networks, and thus cannot intelligently provide a
- complete source route on a packet. And requirements for
- sophistication on the part of the host means adding that
- sophistication to many different implementations.
-
- The proposed solution is that:
-
- 1) If a gateway receives a packet for a partitioned destination
- network, with no partition name filled in in the packet header,
- that gateway duplicates the packet, sending a copy of the packet
- to each partition. (Subsequent gateways will not duplicate the
- packet because the first gateway would have supplied partition
- names on the packets it sent out.)
-
- 2) Gateways on a partitioned network fill in their IDs in packets
- leaving the partitioned network.
-
- 3) Hosts communicating with a host on a partitioned network can
- either ignore the whole network partitioning issue, or copy the
- partition name from packets returning from the host on the
- partitioned network. If hosts ignore the partitioning issue, the
- cost is duplicated packets. If hosts choose to copy the
- information, they must keep state information per host on that
- partitioned network, and they must notice when that information
- becomes out of date (if packets fail to reach their destination,
- the host should erase its knowledge of which partition the packet
- was routed to, since the other host might have moved to a
- different partition).
-
- One advantage of this design is that gateways can be completely
- sheltered from per-host state information. They already detect
- partitioned networks, so the only added work is duplicating
- packets and filling in partition names. Another advantage is
- that hosts can either be totally oblivious of the whole issue, at
-
-
- - 7 -
-
-
-
- the expense of duplicated packets, or they can, without much
- work, obtain the information of the proper partition for a given
- host. And the decision as to which course to take can be taken
- independently by each host.
-
-
-
- IX. ROUTING PACKETS TO THE CORRECT PARTITION
-
- As stated above, a gateway G, distant from partitioned network N,
- must know which gateways are involved in a partition before G can
- correctly route a packet -- it might have to make a different
- routing decision for one partition than for another one.
-
- When G detects a network has become partitioned into n pieces, G
- must add n-1 rows and columns to its shortest distance matrix,
- i.e., it treats each partition as a separate network. It is an
- implementation detail, and not a difficult one, to ensure that
- the gateway understands the meaning of each row and column. And
- given that the gateway understands the meaning of each row and
- column, it is easy for it to fill in the connectivity matrix from
- its table of total link state. The computation is done exactly
- as in the nonpartitioned case.
-
-
- X. FLYING PACKET RADIOS
-
- In IEN #110, Dr. Vinton Cerf raises the following problem. An
- airborne PR flies above two ground nets, A, and B. At times the
- airplane PR is in radio contact of A, B, both nets, or neither
- ground net. The problem is for hosts in the catenet to send
- packets to the airplane PR. They cannot simply fill in "A" or
- "B" as the destination network because that might be incorrect.
- And if they did somehow know the proper net at the time, higher
- level protocols would get confused by a changing net number, and
- be unable to match packets to an existing connection.
-
- In IEN #120 I presented a scheme for solving this problem that
- could be implemented without the rest of the network partitioning
- design. That scheme involved assigning a virtual net number to
- each airplane PR. Gateways on the ground nets A and B would have
- half gateways associated with each virtual net, that "pinged" the
- associated airplane PR occasionally to see if it was reachable.
- The cost of this solution is traffic overhead in the ground nets,
- from all the pinging, and extra net numbers for each airplane PR.
- However, it is easy to implement.
-
- I will present here a solution that is much cheaper, but depends
- on the rest of the design in the paper being adopted.
-
-
-
-
-
-
- - 8 -
-
-
-
- The solution is that:
-
- 1) A single virtual net number, P, is assigned to include all
- airborne PRs.
-
- 2) Gateways on A and B always report that they are connected to
- "net P" and that their links to their "neighbors" on the other
- ground net, via net P, are always down. (They could report their
- link via net P to be up if some airplane PR connects the two
- ground nets so that they actually can reach the other ground
- gateways, but it is simpler to declare that link always down, and
- it will avoid forcing the airborne PRs to forward much traffic.)
- Gateways on A report their links to the other gateways on A, via
- net P, to be in the same state as the actual links via net A.
- The gateways on net B do likewise.
-
- 3) Thus P will look to the rest of the catenet like a partitioned
- net. Consequently, gateways receiving packets for P with a blank
- partition name will duplicate the packet, sending a copy to each
- "partition" of P, i.e., a copy to net A and a copy to net B. And
- gateways on nets A and B receiving packets from "P", will fill in
- their IDs as the "partition name" of P.
-
- 4) Hosts talking to an airborne PR can either ingore the whole
- problem and let its packets get duplicated, or copy the proper
- "partition name" in packets it receives from the airplane.
- However, since airplanes are liable to switch nets quickly, the
- information is liable to become quickly out of date. Thus it is
- probably better (assuming there are only 2 ground nets) to live
- with the duplicated packets, ensuring delivery (assuming the
- airplane PR is reachable via one of the ground nets).
-
- The cost of this approach is the implementation of all the rest
- of the network partitioning design presented in this paper, plus
- a single virtual net number and duplicated packets to the
- airplane PRs (or hosts copying the information provided in
- packets from the airplane PRs).
-
-
- XI. COMPARISON WITH IEN 135
-
-
- In IEN #135, Carl Sunshine and Jon Postel present an alternative
- approach to the flying packet radio problem. Their approach is:
-
- 1) A virtual net number is assigned for all airplane PRs.
-
- 2) Airplane PRs have the responsibility for ascertaining their
- current network location, and reporting this information to a
- global database. (There can be multiple global databases for
- reliability.)
-
-
-
-
- - 9 -
-
-
-
- 3) Hosts wishing to communicate with an airplane PR request their
- location from the global database, which furnishes them with a
- single level source route to write into the packet header. The
- source route consists of both a net number (as in ground net A or
- ground net B, depending on which net the airplane in question is
- currently in radio connectivity of) and a local address on that
- net, called a "forwarder". The purpose of the "forwarder" is
- merely to fit this into the already existing mechanism of source
- routing, which requires a full internet address. It would be
- most convenient to have as the ID of the "forwarder" the same ID
- as the destination, so that the destination would be net P, host
- XXX, and the source route would be net A (or B), host XXX.
-
- The authors propose this scheme over the one presented in IEN
- #120 (and the one presented here) in order to save the gateways
- from dealing with the problem.
-
- Costs of the scheme in IEN #135 are:
-
- 1) Maintaining a global database is complex and costly.
-
- 2) The airplane PR must ascertain its true net and report this
- information to the global database. There is no mechanism
- currently designed into packet radio networks to make this easy,
- and certainly no mechanism for alerting the airplane PR that it
- is "about to leave" a net, as suggested in IEN #135.
-
- 3) Hosts wishing to communicate with an airplane PR must first
- contact the global database. This is extra code that must be
- implemented in order for the host to communicate at all with the
- airplane PR. And it must be implemented in every host that might
- be in contact with an airplane PR.
-
- 4) Airplane PRs are liable to change nets so quickly that by the
- time the airplane discovers it has changed nets, contacted the
- global database, the host has queried the database, and the host
- has received a reply from the database, the airplane PR has very
- likely changed nets.
-
- The costs of the scheme presented in this paper are:
-
- 1) Implementing a link state routing algorithm for gateways. A
- link state algorithm requires more control traffic and more
- computation than the old-style ARPANET algorithm. It requires a
- recoding of the gateways, since they currently have implemented
- the old-style ARPANET algorithm. However, the extra overhead of
- the link state scheme is not that bad, and there are
- possibilities for decreasing the overhead further (for instance,
- by declaring a set of gateways all connected to the same networks
- and all in contact with each other as a "group" and having a
- single member of the group report status information for the
- entire group). And the link state scheme gives needed
- flexibility for other internet routing problems, for instance
- extended routing (including access control).
-
- - 10 -
-
-
-
- 2) The rest of the partitioning design, all of which is minor,
- including:
- a) having gateways detect a partition and compute routing
- accordingly
- b) having gateways receiving packets for a partitioned network
- duplicate the packets
- c) having gateways on the partitioned net fill in the
- partition name on outgoing packets
- d) extra packets or having hosts communicating with a host on
- a partitioned net copy the partition name from incoming
- packets
-
- 3) For the flying packet radios, a single virtual net number.
-
-
- XII. CONCLUSIONS
-
- A link state scheme, as originally presented in PRTN 242,
- modified as presented in part IV of this paper should be the
- basis of internet routing.
-
- The internet header should include a field long enough for a
- gateway ID, for the purpose of specifying a partition name. A
- partition name is the ID of any member gateway on that partition.
-
- The first gateway that handles a packet checks to see if it is
- addressed to a partitioned network. If so, and if the partition
- name field in the internet header is blank, the gateway
- duplicates the packet for each partition, and sends a copy to
- each partition (filling in the partition name on each copy).
-
- When a host receives packets with a partition name filled in, it
- can copy that information in a per host table, being careful to
- erase that information if packets fail to reach the destination.
- Hosts that choose not to implement that will cause nothing more
- serious than duplicated packets.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 11 -
-
-